Abstract

This paper describes a personalized k-anonymity model for protecting location privacy against various privacy threats through location information sharing. Our model has two unique features. First, we provide a unified privacy personalization framework to support location k-anonymity for a wide range of users with context-sensitive personalized privacy requirements. This framework enables each mobile node to specify the minimum level of anonymity it desires as well as the maximum temporal and spatial resolutions it is willing to tolerate when requesting k-anonymity preserving location-based services (LBSs). Second, we devise an efficient message perturbation engine, which is run by the location protection broker on a trusted server and performs location anonymization on mobile users' LBS request messages, such as identity removal and spatio-temporal cloaking of location information. We develop a suite of scalable yet efficient spatio-temporal cloaking algorithms, called CliqueCloak algorithms, to provide high-quality personalized location k-anonymity, aiming at avoiding or reducing known location privacy threats before forwarding requests to LBS provider(s). The effectiveness of our CliqueCloak algorithms is studied under various conditions using realistic location data synthetically generated from real road maps and traffic volume data.

1 Introduction 

Advances in sensing and tracking technologies create new 
opportunities for location-based applications but they also 
create significant privacy risks. According to the report 
by Computer Science and Telecommunications Board on IT 
Roadmap to a Geospatial Future [2], location based services 
(LBSs) are expected to form an important part of the fu- 
ture computing environments that will seamlessly and ubiqui- 
tously integrate into our life (examples include NextBus [8], 
CyberGuide [1], or FCC's Phase II E911 Rules). Although with LBSs mobile users can obtain a wide variety of location-based information services, and businesses can extend their competitive edges in mobile commerce and ubiquitous service provisions, extensive deployment of location-based services may open doors for adversaries to endanger the location privacy of mobile users and to expose LBSs to significant vulnerabilities for abuse [16]. Location privacy threats describe the risk that an adversary learns the locations that a
subject visited, as well as times during which these visits took 
place. Through these locations, the adversary can receive 
clues about private information such as political affiliations, 
alternative lifestyles, or medical problems. The two classes 
of most popular privacy threats to LBSs are communication 
privacy threats, exemplified by the passive-logging based at- 
tacks, and location privacy threats, represented by space or 
time correlated inference attacks. Even when a subject does 
not disclose her identity at a private location, an adversary 
may still gain this information through location tracking or 
space and time correlation inference. If a subject is identified at any point, her complete movements can also be exposed.

One way to reduce location privacy risks is to promote k-anonymity preserving management of location information and develop efficient and scalable system-level facilities for protecting location privacy with location k-anonymity. Anonymity can be seen as "a state of being not identifiable within a set of subjects, the anonymity set" [9]. The concept of k-anonymity was originally introduced in the context of relational data privacy research [11]. It addresses the question
of "How can a data holder release a version of its private data 
with scientific guarantees that the individuals who are the sub- 
jects of the data cannot be re-identified while the data remain 
practically useful" [14]. 

In the context of LBSs and mobile users, location k-anonymity refers to k-anonymous usage of location information. A subject is considered k-anonymous with respect to location information if and only if the location information sent from a mobile user to an LBS is indistinguishable from the location information of at least $k - 1$ other subjects (e.g., different mobile nodes) [6]. Location perturbation is an effective technique for dealing with location privacy breaches exemplified by the above cases and for supporting location k-anonymity. If the location information sent by each mobile node is perturbed by replacing the position of the mobile node with a coarser grained spatial range, such that there are several other mobile nodes within that range, say $k$ of them, then the adversary will have uncertainty in matching the mobile node to a location-identity association she has obtained through external observation or knowledge. This uncertainty increases with the value of $k$, providing better privacy.

In this paper, we describe a personalized k-anonymity model for protecting location privacy against various privacy threats through location information sharing. There is a close synergy between location privacy and k-anonymity. Larger $k$ in location anonymity usually implies higher guarantees for location privacy. Therefore, to ensure that a subject is k-anonymous, one can perturb the location information by replacing it with a relatively large spatial region (range) or by delaying the message long enough. However, this has two downsides. First, low spatial resolution in location perturbation may lead the LBS provider to provide more coarse grained location-dependent information to the mobile user, which may deteriorate the quality of service; or it may result in sending more information than required back to the mobile user, which must then be filtered out by the mobile node, resulting in communication and processing overhead. Second, the extra delay introduced through temporal cloaking of location information may decrease the service quality perceived by the mobile user.

The development of our location privacy model exhibits two distinct features. First, it enables each mobile node to specify the minimum level of anonymity it desires as well as the maximum temporal and spatial resolutions it is willing to tolerate when requesting k-anonymity preserving location-based services. Concretely, instead of imposing a uniform $k$ for all mobile users, we provide efficient algorithms and system-level facilities to support personalized $k$ at per-user level. Each user can specify a different k-anonymity level, and can change this specification at per-message granularity. Furthermore, each user can specify her preferred spatial and temporal tolerance values that should be respected while maintaining the desired level of location k-anonymity. We call such a tolerance specification (service quality) and preference of $k$ value (location privacy) the anonymization constraint of the message. By providing a unified framework to support location k-anonymity with variable anonymization constraints, we allow a wide range of users to benefit from location privacy protection with personalized privacy and quality requirements.

Second, we devise an efficient message perturbation engine, which is run by the location protection broker on a trusted server and performs location anonymization on mobile users' LBS request messages, such as identity removal and spatio-temporal cloaking of location information. We develop a suite of scalable yet efficient spatio-temporal cloaking algorithms, called CliqueCloak algorithms, taking into account the tradeoffs between location privacy and quality of service. Our location perturbation engine can continuously process a stream of messages for location k-anonymity, and can work with different CliqueCloak algorithms to perturb the location information contained in the messages sent from mobile users by performing spatio-temporal cloaking. The resulting three dimensional box, called the spatio-temporal cloaking box, indicates the acceptable decrease of the spatial resolution of location information and the tolerable delay of the message in an effort to meet the specified anonymity level. Our experiments show that the proposed personalized location k-anonymity model, through the use of our perturbation engine and its CliqueCloak algorithms, can achieve a high guarantee of k-anonymity and high resilience to location privacy threats without introducing a significant performance penalty.

2 Personalized Location k-anonymity 

We assume that the LBS system consists of mobile nodes, 
a wireless network, anonymity servers, and LBS servers. Location information is typically determined by a location information source, such as a GPS receiver in a vehicle. We assume that location information includes temporal information
(when the subject was present at the location) in addition to 
spatial information. Mobile nodes communicate with third 
party LBS providers through one or a collection of anonymity 
servers located at trusted computing bases. The mobile nodes 
establish communication with an anonymity server through 
an authenticated and encrypted connection. Each message destined to an LBS provider contains the location information of the mobile node and a timestamp, in addition to service-specific information. Upon receiving a message from a mobile node, the anonymity server decrypts the message, removes any identifiers such as IP addresses, perturbs the location information through spatio-temporal cloaking, and then forwards the anonymized message to the LBS provider.

In order to capture varying location privacy requirements and ensure different levels of service quality, each mobile node specifies its anonymity level ($k$ value), spatial tolerance, and temporal tolerance. The main task of a location anonymity server is to transform each message received from mobile nodes into a new message that can be safely (k-anonymously) forwarded to the LBS provider. The key idea underlying the location k-anonymity model is two-fold. First, a given degree of location anonymity can be maintained, regardless of population density, by decreasing the location accuracy through enlarging the exposed spatial area, such that there are $k - 1$ other mobile nodes present in the same spatial area. This approach is called spatial cloaking. Second, one can achieve location anonymity by delaying the message until $k$ mobile nodes have visited the area in which the message sender is located. This approach is called temporal cloaking.

We denote the set of messages received from the mobile nodes as $S$. We formally define a message $m_s$ in the set $S$ as follows: $\langle u_{id}, r_{no}, \{t, x, y\}, k, \{d_t, d_x, d_y\}, C \rangle$. Messages are uniquely identifiable within the set $S$ by the pair $(u_{id}, r_{no})$ of sender identifier and message reference number. Messages from the same mobile node have the same sender identifier but different reference numbers. In a received message, $x$, $y$, and $t$ together form the three dimensional spatio-temporal location point of the message, denoted as $L(m_s)$. The coordinate $(x, y)$ refers to the spatial position of the mobile node in the two dimensional space (i.e., $x$-axis and $y$-axis), and the timestamp $t$ refers to the time point at which the mobile node was present at that position (temporal dimension: $t$-axis of the message). The $k$ value of the message specifies the desired minimum anonymity level. A value of $k = 1$ means that anonymity is not required for the message. A value of $k > 1$ means that the perturbed message will be assigned a spatio-temporal cloaking box that is indistinguishable from at least $k - 1$ other perturbed messages, each from a different mobile node. Thus, larger $k$ values imply a higher degree of privacy. One way to determine the appropriate $k$ value is to assess the certainty with which an adversary can associate the message with an external location/identity binding. This certainty is given by $1/k$. This means that $P\%$ privacy requires setting the $k$ value to $(1 - P/100)^{-1}$. The $d_t$ value of the message represents the temporal tolerance specified by the user. It means that the perturbed message should have a spatio-temporal cloaking box whose projection on the temporal dimension does not contain any point more than $d_t$ distance away from $t$. Similarly, $d_x$ and $d_y$ specify the tolerances with respect to the spatial dimensions. The values of these three parameters depend on the requirements of the external LBS and on users' preferences with regard to quality of service. For instance, larger spatial tolerances may result in less accurate answers to location-dependent service requests, and larger temporal tolerances may result in higher latencies of the messages. Let $\Phi(v, d) = [v - d, v + d]$ be a function that extends a numerical value $v$ to a range by amount $d$. Then, we denote the spatio-temporal constraint box of a message $m_s$ as $B_{cn}(m_s)$ and define it as $(\Phi(m_s.x, m_s.d_x), \Phi(m_s.y, m_s.d_y), \Phi(m_s.t, m_s.d_t))$. The field $C$ in $m_s$ denotes the message content.
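The message format and the constraint box can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Message`, the field names, and the helper `k_for_privacy` are hypothetical and do not come from the paper. The helper rounds $(1 - P/100)^{-1}$ up to the next integer, since $k$ counts mobile nodes.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Message:
    """Hypothetical container for <u_id, r_no, {t, x, y}, k, {d_t, d_x, d_y}, C>."""
    uid: str
    rno: int
    t: float
    x: float
    y: float
    k: int
    dt: float
    dx: float
    dy: float
    content: str = ""


def phi(v, d):
    """Phi(v, d) = [v - d, v + d], returned as a (low, high) pair."""
    return (v - d, v + d)


def constraint_box(m):
    """B_cn(m): one interval per dimension, in the order (x, y, t)."""
    return (phi(m.x, m.dx), phi(m.y, m.dy), phi(m.t, m.dt))


def k_for_privacy(p_percent):
    """Smallest integer k whose certainty 1/k does not exceed 1 - P/100."""
    p = Fraction(str(p_percent))  # exact arithmetic avoids rounding surprises
    if not 0 <= p < 100:
        raise ValueError("P must satisfy 0 <= P < 100")
    return math.ceil(Fraction(100) / (Fraction(100) - p))
```

For example, a message at $x = 5$, $y = 7$, $t = 10$ with tolerances $d_x = 1$, $d_y = 1.5$, $d_t = 2$ has the constraint box $([4, 6], [5.5, 8.5], [8, 12])$, and 80% privacy corresponds to $k = 5$.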

We denote the set of perturbed (anonymized) messages as $T$. We formally define a message $m_t$ in the set $T$ as follows: $\langle u_{id}, r_{no}, \{X : [x_s, x_e], Y : [y_s, y_e], I : [t_s, t_e]\}, C \rangle$. For each message $m_s$ in $S$, there exists at most one corresponding message $m_t$ in $T$. We call the message $m_t$ the perturbed format of message $m_s$, denoted as $m_t = R(m_s)$. The function $R$ defines a one-to-one and onto mapping from $S$ to $T$. Concretely, if $m_t = R(m_s)$, then $m_t.u_{id} = m_s.u_{id}$ and $m_t.r_{no} = m_s.r_{no}$. If $R(m_s) = \emptyset$, then the message $m_s$ is not anonymized. The $(u_{id}, r_{no})$ fields of a message in $T$ should be replaced with a dummy identifier (e.g., with $h(u_{id} \| r_{no})$, where $h$ is a secure hash function) before the message can be safely forwarded to the LBS provider. In a perturbed message, $X : [x_s, x_e]$ denotes the extent of the spatio-temporal cloaking box of the message on the $x$-axis, with $x_s$ and $x_e$ denoting the two end points of the interval. The definitions of $Y : [y_s, y_e]$ and $I : [t_s, t_e]$ are similar, with the $y$-axis and $t$-axis replacing the $x$-axis, respectively. We denote the spatio-temporal cloaking box of a perturbed message as $B_{cl}(m_t)$ and define it as $(m_t.X : [x_s, x_e], m_t.Y : [y_s, y_e], m_t.I : [t_s, t_e])$. The field $C$ in $m_t$ denotes the message content. We now describe how the fields of a perturbed message in set $T$ relate to its counterpart in set $S$.

There are three basic properties that must hold between a raw message $m_s$ in $S$ and its perturbed format $m_t$ in $T$. These are: (i) Spatio-temporal Containment, which states that the cloaking box $B_{cl}(m_t)$ of the perturbed message should contain the spatio-temporal point $L(m_s)$ of the original message $m_s$. (ii) Spatio-temporal Resolution, which states that for each of the three dimensions, the extent of the spatio-temporal cloaking box of the perturbed message on that dimension should be contained within the interval defined by the maximum tolerance value specified in the original message. This is equivalent to stating that the cloaking box $B_{cl}(m_t)$ of the perturbed message should be contained within the constraint box $B_{cn}(m_s)$ of the original message $m_s$. (iii) Content Preservation, which ensures that the message content remains as it is, i.e., $m_s.C = m_t.C$.

We formally capture the essence of location k-anonymity by the following requirement, which states that, for a message $m_s$ in $S$ and its perturbed format $m_t$ in $T$, the following condition must hold:

- Location k-anonymity:

\[
\begin{aligned}
&\exists\, T' \subset T, \text{ s.t. } m_t \in T',\ |T'| \ge m_s.k, \\
&\forall \{m_{t_i}, m_{t_j}\} \subset T',\ m_{t_i}.u_{id} \ne m_{t_j}.u_{id} \text{ and} \\
&\forall m_{t_i} \in T',\ B_{cl}(m_{t_i}) = B_{cl}(m_t)
\end{aligned}
\]

The k-anonymity requirement demands that, for each perturbed message $m_t = R(m_s)$, there exist at least $m_s.k - 1$ other perturbed messages with the same spatio-temporal cloaking box, each from a different mobile node. A key challenge for the cloaking algorithms employed by the message perturbation engine is to find a set of messages within a minimal spatio-temporal cloaking box that satisfies the above conditions.
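The requirement can be checked mechanically on a set of perturbed messages. The sketch below is one possible reading, not the authors' code: `Perturbed` and `is_k_anonymous` are hypothetical names, and identifiers are assumed to be still available (that is, the check runs before they are replaced by dummy identifiers). Because the subset $T'$ may contain at most one message per mobile node, it suffices to count the distinct senders that share the cloaking box of $m_t$.

```python
from typing import NamedTuple, Tuple

Interval = Tuple[float, float]


class Perturbed(NamedTuple):
    """Hypothetical perturbed message: sender id, reference number, cloaking box."""
    uid: str
    rno: int
    box: Tuple[Interval, Interval, Interval]  # (X, Y, I) intervals


def is_k_anonymous(m_t, k, forwarded):
    """True if some T' containing m_t has >= k messages from distinct senders,
    all carrying exactly the same cloaking box as m_t."""
    senders = {p.uid for p in forwarded if p.box == m_t.box}
    senders.add(m_t.uid)  # m_t itself always belongs to T'
    return len(senders) >= k
```

Two messages from the same mobile node with the same cloaking box count only once, which reflects the second line of the condition.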

3 Message Perturbation Engine 

The message perturbation engine processes each incoming message $m_s$ from mobile nodes in four steps. The first step, called zoom-in, involves locating a subset of all messages currently pending in the engine. This subset contains messages that are potentially useful for anonymizing the newly received message $m_s$. The second step, called detection, is responsible for finding the particular group of messages within the set of messages located in the zoom-in step, such that this group of messages can be anonymized together with the newly received message $m_s$. If such a group of messages is found, then the perturbation is performed over these messages in the third step, called perturbation, and the perturbed messages are forwarded to the LBS provider. The last step, called expiration, checks for pending messages whose deadlines have passed, and drops them. The deadline of a message is the high point along the temporal dimension of its spatio-temporal constraint box.

3.1 Message Anonymization 

A main technical challenge in developing an efficient cloaking algorithm is to find the smallest spatio-temporal cloaking box, for each message $m_s \in S$, within its specified spatial and temporal tolerances, such that there exist at least $m_s.k - 1$ other messages, each from a different mobile node, with the same minimal cloaking box. Let us consider this problem in two steps (in reverse order): (1) given a set $M$ of messages that can be anonymized together, how to find the minimal cloaking box in which all messages in $M$ reside; and (2) for a message $m_s \in S$, how to find the set $M$ containing $m_s$ and the group of messages that can be anonymized together with $m_s$. A set $M \subset S$ of messages is said to be anonymized together if its messages are assigned the same cloaking box and all the requirements defined in Section 2 are satisfied for all messages in $M$.

Consider a set $M \subset S$ of messages that can be anonymized together. The best strategy to find a minimal cloaking box for all messages in $M$ is to use the minimum bounding rectangle (MBR¹) of the spatio-temporal points of the messages in $M$ as the minimal cloaking box. This definition of minimal cloaking box also ensures that the cloaking box is contained in the constraint boxes of all other messages in $M$. We denote the minimum spatio-temporal cloaking box of a set $M \subset S$ of messages that can be anonymized together as $B_m(M)$, and define it to be equal to the MBR of the points in the set $\{L(m_s) \mid m_s \in M\}$.

Now let us consider the second step: given a message $m_s \in S$, how to find the set $M$ containing $m_s$ and the group of messages that can be anonymized together with $m_s$. Based on the above analysis and observations, one way to tackle this problem is to model the anonymization constraints of all messages in $S$ as a constraint graph, defined below, and translate the problem into the problem of finding cliques that satisfy certain conditions in the constraint graph. Let $G(S, E)$ be an undirected graph where $S$ is the set of vertices, each representing a message received at the message perturbation engine, and $E$ is the set of edges. There exists an edge $e = (m_{s_i}, m_{s_j}) \in E$ between two vertices $m_{s_i}$ and $m_{s_j}$ if and only if the following conditions hold: (i) $L(m_{s_i}) \in B_{cn}(m_{s_j})$, (ii) $L(m_{s_j}) \in B_{cn}(m_{s_i})$, (iii) $m_{s_i}.u_{id} \ne m_{s_j}.u_{id}$. We call this graph the constraint graph. The conditions (i), (ii), and (iii) together state that two messages are connected in the constraint graph if and only if they originate from different mobile nodes and their spatio-temporal points are contained in each other's constraint boxes defined by their tolerance values.
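The edge conditions translate directly into code. The following sketch builds the whole constraint graph by testing every pair of messages, which is quadratic and meant only to make conditions (i) to (iii) concrete. The names `Msg`, `in_constraint_box`, `has_edge`, and `build_constraint_graph` are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Msg:
    """Hypothetical message record; only the fields the edge test needs."""
    uid: str
    rno: int
    x: float
    y: float
    t: float
    dx: float
    dy: float
    dt: float


def in_constraint_box(p, m):
    """True if the point L(p) lies inside the constraint box B_cn(m)."""
    return (abs(p.x - m.x) <= m.dx
            and abs(p.y - m.y) <= m.dy
            and abs(p.t - m.t) <= m.dt)


def has_edge(a, b):
    """Conditions (i)-(iii): mutual containment and different senders."""
    return a.uid != b.uid and in_constraint_box(a, b) and in_constraint_box(b, a)


def build_constraint_graph(msgs):
    """Adjacency sets of the constraint graph, via a brute-force pairwise test."""
    adj = {m: set() for m in msgs}
    for a, b in combinations(msgs, 2):
        if has_edge(a, b):
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

Note that containment must hold in both directions: a message with large tolerances is not connected to a nearby message whose own tolerances are too small to reach it.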

Given the definition of the constraint graph, the following property holds. Let $M = \{m_{s_1}, m_{s_2}, \ldots, m_{s_l}\}$ be a set of messages in $S$. For each message $m_{s_i}$ in $M$, we define $m_{t_i} = \langle m_{s_i}.u_{id}, m_{s_i}.r_{no}, B_m(M), m_{s_i}.C \rangle$. Then $m_{t_i}$, $1 \le i \le l$, is a valid perturbed format of $m_{s_i}$ if and only if the set $M$ of messages forms an $l$-clique in the constraint graph $G(S, E)$ with the additional condition that for any message $m_{s_i}$ in $M$, we have $m_{s_i}.k \le l$ (i.e., the user-specified $k$ value of $m_{s_i}$ is not larger than the cardinality of the set $M$). See our technical report [4] for a formal theorem (referred to as the Clique-Cloak theorem) governing this property.
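A small sketch may help to see the property at work. It checks whether a candidate set $M$ forms a clique whose size covers every member's $k$, and computes $B_m(M)$ as the MBR of the members' points. All names (`Msg`, `linked`, `cloaking_box`, `can_anonymize_together`) are hypothetical, and the coordinates used in the usage note are made up for illustration rather than taken from Figure 1.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Msg:
    """Hypothetical message record with location, k, and tolerances."""
    uid: str
    x: float
    y: float
    t: float
    k: int
    dx: float
    dy: float
    dt: float


def linked(a, b):
    """Edge test of the constraint graph (conditions i-iii)."""
    def inside(p, m):
        return (abs(p.x - m.x) <= m.dx and abs(p.y - m.y) <= m.dy
                and abs(p.t - m.t) <= m.dt)
    return a.uid != b.uid and inside(a, b) and inside(b, a)


def cloaking_box(msgs):
    """B_m(M): the MBR of the points, as ((x_s, x_e), (y_s, y_e), (t_s, t_e))."""
    points = [(m.x, m.y, m.t) for m in msgs]
    return tuple((min(axis), max(axis)) for axis in zip(*points))


def can_anonymize_together(msgs):
    """True if msgs form an l-clique and every member has k <= l."""
    size = len(msgs)
    return (all(m.k <= size for m in msgs)
            and all(linked(a, b) for a, b in combinations(msgs, 2)))
```

With three mutually reachable messages whose $k$ values are 2, 3, and 3, any pair fails the check if it contains a message with $k = 3$, while the full triple passes and receives the MBR of its three points as the common cloaking box.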

We demonstrate the application of this property with an example. Figure 1 shows four messages, $m_1$, $m_2$, $m_3$, and $m_4$. We assume that each message is from a different mobile node. We omit the time domain in this example for ease of explanation, but the extension to spatio-temporal space is straightforward. Initially, the first three of these messages are inside the system. Spatial layout I shows how these three messages spatially relate to each other. It also depicts the spatial constraint boxes of the messages. Constraint graph I shows how these messages are connected to each other in the constraint graph. Since the spatial locations of messages $m_1$ and $m_2$ are mutually contained in each other's spatial constraint box, they are connected in the constraint graph, and $m_3$ lies apart by itself. Although $m_1$ and $m_2$ form a 2-clique, they cannot be anonymized and removed from the graph. This is because $m_2.k = 3$ and as a result the clique does not satisfy the Clique-Cloak theorem. Spatial layout II shows the situation after $m_4$ arrives, and constraint graph II shows the corresponding status of the constraint graph. With the inclusion of $m_4$, there exists only one clique whose size is at least equal to the maximum $k$ value of the messages it contains. This clique is $\{m_1, m_2, m_4\}$. We can compute the MBR of the messages within the clique, use it as the spatio-temporal cloaking box of the perturbed messages, and then safely remove this clique. Figure 1(b) clearly shows that the MBR is contained by the spatial constraint boxes of all messages within the clique.

¹ The MBR of a set of points is the smallest rectangular region enclosing all the points.

Although in the described example we found a single clique immediately after $m_4$ was received, we could have had cliques of different sizes to choose from. For instance, if $m_4.k$ were 2, then $\{m_3, m_4\}$ would also have formed a valid clique according to the Clique-Cloak theorem. We address the questions of what kind of cliques to search for and when to search for such cliques in more detail in Section 4.

There are three key points that make the application of the Clique-Cloak theorem effective in practice: (i) Successful anonymization of a message $m_{s_c}$ results in the anonymization of at least $m_{s_c}.k - 1$ other messages. (ii) The search performed on the constraint graph for the purpose of anonymizing a message $m_{s_c}$ only deals with a small subgraph that consists of $m_{s_c}$ and its neighbors (illustrated in Figure 1 for $m_4$, $m_5$, and $m_6$), and the cost of this step does not depend on the scale of the complete constraint graph. (iii) For messages whose search subgraphs do not share any messages (exemplified by $m_4$, $m_5$, and $m_6$ in Figure 1(e)), the anonymization process can be efficiently parallelized. When overlaps exist, simple locking strategies can be employed to achieve effective parallelization (see [4]).

Figure 1: Illustration of the Clique-Cloak Algorithm

3.2 Data Structures 

We briefly describe the four main data structures that are 
used in the message perturbation engine. 

Message Queue, $Q_m$: The message queue is a simple FIFO
queue, which collects the messages sent from the mobile 
nodes in the order they are received. The messages are 
popped from this queue by the message perturbation engine 
in order to be processed. 

Multi-dimensional Index, $I_m$: The multi-dimensional index is used to allow efficient search on the spatio-temporal points of the messages. For each message, say $m_s$, in the set of messages that are not yet anonymized and are not yet dropped according to the expiration condition (specified by the temporal tolerance), $I_m$ contains the three dimensional point $L(m_s)$ as a key, together with the message $m_s$ as data. The index is implemented using an in-memory R*-tree in our system.

Constraint Graph, $G_m$: The constraint graph is a dynamic in-memory graph, which contains the messages that are not yet anonymized and not yet dropped due to expiration. The structure of the constraint graph is defined in Section 3.1. The multi-dimensional index $I_m$ is mainly used to speed up the maintenance of the constraint graph $G_m$, which is updated when new messages arrive or when messages get anonymized or expire.

Expiration Heap, $H_m$: The expiration heap is a min-heap, sorted based on the deadlines of the messages. For each message, say $m_s$, in the set of messages that are not yet anonymized and are not yet dropped due to expiration, $H_m$ contains the deadline $m_s.t + m_s.d_t$ as the key, together with the message $m_s$ as the data. The expiration heap is used to detect expired messages (i.e., messages that cannot be successfully anonymized), so that they can be dropped and removed from the system.
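As an illustration of the last structure, the sketch below implements an expiration heap on top of Python's `heapq` module. It is a minimal sketch under assumptions: the class `ExpirationHeap` and its methods are hypothetical, message identifiers are assumed to be unique, and removal of an already anonymized message is handled lazily (the entry stays in the heap and is skipped when it surfaces), which is a design choice of this sketch rather than something the paper prescribes.

```python
import heapq
import itertools


class ExpirationHeap:
    """Min-heap keyed by deadline t + d_t, with lazy removal."""

    def __init__(self):
        self._heap = []                 # entries: (deadline, sequence, msg_id)
        self._live = set()              # ids still pending in the engine
        self._seq = itertools.count()   # tie-breaker for equal deadlines

    def push(self, msg_id, t, dt):
        heapq.heappush(self._heap, (t + dt, next(self._seq), msg_id))
        self._live.add(msg_id)

    def discard(self, msg_id):
        """Mark a message as gone (e.g., it was anonymized and forwarded)."""
        self._live.discard(msg_id)

    def pop_expired(self, now):
        """Pop and return the ids of pending messages whose deadline < now."""
        expired = []
        while self._heap and self._heap[0][0] < now:
            _, _, msg_id = heapq.heappop(self._heap)
            if msg_id in self._live:
                self._live.remove(msg_id)
                expired.append(msg_id)
        return expired
```

The caller would then remove every returned identifier from the index and the constraint graph as well.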

3.3 Perturbation Engine Algorithms 

Upon arrival of a new message, the message engine updates the message queue (FIFO) to include this message. The message perturbation process works by continuously popping messages from the message queue and processing them for k-anonymity in four steps. The pseudo code of the message perturbation engine is given in Algorithm 1.

Zoom-in: In this step we update the data structures with the new message from the message queue and integrate the new message into the constraint graph, i.e., we search the constraint graph containing all the messages pending for perturbation and locate the messages that should be assigned as its neighbors in the graph (zoom-in). Concretely, when a message $m_{s_c}$ is popped from the message queue, it is inserted into the index $I_m$ using $L(m_{s_c})$, inserted into the heap $H_m$ using $m_{s_c}.t + m_{s_c}.d_t$, and inserted into the graph $G_m$ as a node. Then the edges incident upon vertex $m_{s_c}$ are constructed in the constraint graph $G_m$ by searching the multi-dimensional index $I_m$ using the spatio-temporal constraint box of the message, i.e., $B_{cn}(m_{s_c})$, as the range search condition. The messages whose spatio-temporal points are contained in $B_{cn}(m_{s_c})$ are candidates for being $m_{s_c}$'s neighbors in the constraint graph. These messages (denoted as $N$ in the pseudo code) are filtered based on whether their spatio-temporal constraint boxes contain $L(m_{s_c})$. The ones that pass the filtering step and are different from $m_{s_c}$ become neighbors of $m_{s_c}$. We call the subgraph that contains $m_{s_c}$ and its neighbors the focused subgraph, denoted by $G'_m$. See lines 3-11 in the pseudo code.

Algorithm 1: Message Perturbation Engine

MsgPertEngine()
(1)  while true
(2)    if $Q_m \ne \emptyset$
(3)      $m_{s_c} \leftarrow$ Pop the first item in $Q_m$
(4)      Add $m_{s_c}$ into $I_m$ with $L(m_{s_c})$
(5)      Add $m_{s_c}$ into $H_m$ with $(m_{s_c}.t + m_{s_c}.d_t)$
(6)      Add the message $m_{s_c}$ into $G_m$ as a node
(7)      $N \leftarrow$ Range search $I_m$ using $B_{cn}(m_{s_c})$
(8)      foreach $m_s \in N$, $m_s \ne m_{s_c}$
(9)        if $L(m_{s_c}) \in B_{cn}(m_s)$
(10)         Add the edge $(m_{s_c}, m_s)$ into $G_m$
(11)     $G'_m \leftarrow$ Subgraph of $G_m$ consisting of messages in $N$
(12)     $M \leftarrow$ local-k_Search($m_{s_c}.k$, $m_{s_c}$, $G'_m$)
(13)     if $M \ne \emptyset$
(14)       foreach $m_s$ in $M$
(15)         Output perturbed message $\leftarrow$
(16)           $\langle h(m_s.u_{id} \| m_s.r_{no}), B_m(M), m_s.C \rangle$
(17)         Remove the message $m_s$ from $G_m$
(18)         Remove the message $m_s$ from $I_m$
(19)         Remove the message $m_s$ from $H_m$
(20)   while true
(21)     $m_s \leftarrow$ Topmost item in $H_m$
(22)     if $m_s.t + m_s.d_t <$ now
(23)       Remove the message $m_s$ from $G_m$
(24)       Remove the message $m_s$ from $I_m$
(25)       Pop the topmost element in $H_m$
(26)     else
(27)       break

Algorithm 2: local-k Search Algorithm

local-k_Search($k$, $m_{s_c}$, $G'_m$)
(1)  $U \leftarrow \{m_s \mid m_s \in nbr(m_{s_c}, G'_m) \text{ and } m_s.k \le k\}$
(2)  if $|U| < k - 1$
(3)    return $\emptyset$
(4)  $l \leftarrow 0$
(5)  while $l \ne |U|$
(6)    $l \leftarrow |U|$
(7)    foreach $m_s \in U$
(8)      if $|nbr(m_s, G'_m) \cap U| < k - 2$
(9)        $U \leftarrow U \setminus \{m_s\}$
(10) Find any subset $M \subset U$, s.t. $|M| = k - 1$ and $M \cup \{m_{s_c}\}$ forms a clique
(11) return $M \cup \{m_{s_c}\}$ if such an $M$ exists, $\emptyset$ otherwise

Detection: In this step we apply the local-k search CliqueCloak algorithm (detection) in order to find a suitable clique in the focused subgraph $G'_m$ of $G_m$, which contains $m_{s_c}$ and its neighbors in $G_m$, denoted by $nbr(m_{s_c}, G_m)$. In local-k search, we try to find a clique of size $m_{s_c}.k$ that includes the message $m_{s_c}$ and satisfies the Clique-Cloak theorem. The pseudo code of this step is given separately in Algorithm 2 as the function local-k_Search. Note that the local-k_Search function is called within Algorithm 1 (line 12) with parameter $k$ set to $m_{s_c}.k$. Before beginning the search, a set $U \subset nbr(m_{s_c}, G_m)$ is constructed such that for each message $m_s \in U$, we have $m_s.k \le k$ (lines 1-3). This means that the neighbors of $m_{s_c}$ whose anonymity values are higher than $k$ are simply discarded from $U$, as they cannot be anonymized with a clique of size $k$. Then the set $U$ is iteratively filtered until there is no change (lines 4-9). At each filtering step, each message $m_s \in U$ is checked for whether it has at least $k - 2$ neighbors in $U$. If not, the message cannot be part of a clique that contains $m_{s_c}$ and has size $k$, and thus the message is removed from $U$. After the set $U$ is filtered, the possible cliques in $U \cup \{m_{s_c}\}$ that contain $m_{s_c}$ and have size $k$ are enumerated, and if one satisfying the k-anonymity requirements is found, the messages in that clique are returned. For values of $k$ up to 10 (where $k = 5$ is considered a good level of anonymity [6]), the search step does not form a bottleneck. In fact, the subgraph on which we perform the clique search is localized with respect to $m_{s_c}$ and is very small compared to the complete constraint graph.
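The detection step described above can be sketched in Python as follows. This is an illustrative reading of the local-k search, not the authors' implementation: the function name `local_k_search`, the adjacency-set representation `adj`, and the lookup table `k_of` are assumptions, and the sketch returns the whole clique (including the new message) or an empty set when no clique is found. Candidate subsets are enumerated by brute force, which is workable only because the focused subgraph and $k$ are small.

```python
from itertools import combinations


def local_k_search(k, m_c, adj, k_of):
    """Look for a k-clique around m_c in the focused subgraph.

    adj  -- dict: message id -> set of neighbouring message ids
    k_of -- dict: message id -> anonymity level requested by that message
    """
    # Lines 1-3: keep only neighbours whose own k fits into a clique of size k.
    u = {m for m in adj[m_c] if k_of[m] <= k}
    if len(u) < k - 1:
        return set()

    # Lines 4-9: repeatedly drop candidates with fewer than k - 2 neighbours
    # left in u; such a message cannot belong to a k-clique containing m_c.
    changed = True
    while changed:
        changed = False
        for m in list(u):
            if len(adj[m] & u) < k - 2:
                u.discard(m)
                changed = True

    # Lines 10-11: try every (k - 1)-subset of the survivors; all members are
    # already adjacent to m_c, so only their mutual adjacency is checked.
    for subset in combinations(sorted(u), k - 1):
        if all(b in adj[a] for a, b in combinations(subset, 2)):
            return set(subset) | {m_c}
    return set()
```

Applied to a graph shaped like the example of Section 3.1 (edges $m_1$-$m_2$, $m_1$-$m_4$, $m_2$-$m_4$, $m_3$-$m_4$, with $k$ values assumed to be 2, 3, 2, and 3), a call for $m_4$ with $k = 3$ filters out $m_3$ and returns $\{m_1, m_2, m_4\}$.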

Perturbation: In this step we generate the k-anonymized messages to be forwarded to the external LBS providers. If a suitable clique is found in the detection step, then the messages in the clique (denoted as $M$ in the pseudo code) are anonymized by assigning $B_m(M)$ (i.e., the MBR of the spatio-temporal points of the messages in the clique) as their cloaking box (perturbation). The pairs of sender identifier and message reference number are also replaced with their secure hash values before the actual forwarding takes place. Then these messages are removed from the graph $G_m$, as well as from the index $I_m$ and the heap $H_m$. This step is detailed in the pseudo code in lines 13-19. In case a clique cannot be found, the message stays inside $I_m$, $G_m$, and $H_m$. It may later be picked up and anonymized during the processing of a new message, or it may be dropped when it expires. We discuss some more advanced ways of searching for cliques in Section 4.
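The output of the perturbation step can be illustrated with a short sketch. The choice of SHA-256 from Python's `hashlib` as the hash function $h$, the `"||"` separator used to concatenate the two fields, and the dictionary-based message layout are all assumptions made for this example; the paper only requires a secure hash function.

```python
import hashlib


def dummy_id(uid, rno):
    """h(u_id || r_no), here instantiated with SHA-256 (an assumption)."""
    return hashlib.sha256(f"{uid}||{rno}".encode("utf-8")).hexdigest()


def mbr(points):
    """Minimum bounding box of (x, y, t) points, one interval per dimension."""
    return tuple((min(axis), max(axis)) for axis in zip(*points))


def perturb(clique):
    """Turn the raw messages of a clique into perturbed messages.

    clique -- list of dicts with keys uid, rno, x, y, t, content
    Returns (dummy identifier, cloaking box, content) triples.
    """
    box = mbr([(m["x"], m["y"], m["t"]) for m in clique])
    return [(dummy_id(m["uid"], m["rno"]), box, m["content"]) for m in clique]
```

Every message of the clique receives the same cloaking box, while the content field passes through unchanged, in line with the content preservation property of Section 2.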

Expiration – In this step we take care of expired messages. After processing each message, we check the expiration heap for any messages that have expired. The message on top of the expiration heap is checked, and if its deadline has passed, it is removed from I_m, G_m, and H_m. Such a message cannot be anonymized and is dropped. This step is repeated until a message whose deadline is ahead of the current time is reached. Lines 20-27 of the pseudo code deal with message expiration.
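A minimal sketch of the expiration loop follows, assuming a binary min-heap of (deadline, message id) pairs, a dictionary as a stand-in for the spatial index, and a symmetric adjacency dictionary for the constraint graph. These data structures and names are illustrative assumptions, not the engine's actual implementation.

```python
import heapq


def expire(heap, index, graph, now):
    """Drop every message whose deadline is not ahead of `now`.

    heap:  list of (deadline, msg_id) maintained with heapq (assumed layout)
    index: dict msg_id -> message, standing in for the spatial index
    graph: dict msg_id -> set of neighbor ids (assumed symmetric)
    """
    dropped = []
    while heap and heap[0][0] <= now:
        _, msg_id = heapq.heappop(heap)
        index.pop(msg_id, None)
        # Remove the node and every edge that points back to it.
        for nbr in graph.pop(msg_id, set()):
            if nbr in graph:
                graph[nbr].discard(msg_id)
        dropped.append(msg_id)
    return dropped
```

Because the heap is ordered by deadline, the loop stops at the first message that is still live, so each call costs time proportional to the number of expired messages (times a logarithmic heap factor) rather than to the total number of pending messages.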

4 Discussions on Possible Optimizations 

In this section, first we discuss an improved CliqueCloak 
algorithm, called nbr-k search, which utilizes a different crite- 
rion in determining what kinds of cliques are searched. Then 
we discuss a variation of the CliqueCloak algorithms dis- 
cussed so far, that uses a deferred policy with regard to when 
cliques are searched. We also provide a brief discussion on 
improving the message processing time of CliqueCloak al- 
gorithms. 

When searching for a clique in the focused subgraph, it is essential to ensure that the newly received message, say m_sc, is included in the clique. If a new clique is formed due to the entrance of m_sc into the graph, it must contain m_sc. However, instead of searching for a clique of size m_sc.k, we can try to find the biggest clique that includes m_sc, making sure that every message inside the clique has a k value at most equal to the size of the clique. There are two strong motivations behind this approach. First, by anonymizing a larger number of messages at once, it can provide a higher success rate (a larger number of messages can be successfully anonymized), which also results in better performance, as the graph becomes less crowded. Second, by anonymizing messages that have smaller k values together with messages that have larger k values, it can provide a higher relative level of anonymity (meaning that the user-perceived anonymity levels of the messages are higher than the user-specified anonymity levels). Nbr-k search takes this approach. It first collects the set of k values that the new message m_sc and its neighbors nbr(m_sc, G_m) have, denoted as L. The k values in L are considered in decreasing order until a clique is found or k becomes smaller than m_sc.k (in which case the search returns the empty set). For each k ∈ L considered, the local-k search function is called with appropriate parameters. If a non-empty set is returned from the call, the search halts and the messages within the set are returned.
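The nbr-k strategy can be sketched as follows. This is an illustrative reading of the description, not the paper's pseudo code: the adjacency dictionary `graph`, the `k_of` mapping, and the brute-force clique enumeration with `itertools.combinations` are assumptions made for the example.

```python
from itertools import combinations


def local_k_search(k, new_msg, graph, k_of):
    """Find a clique of size k that contains new_msg and whose members need at most k."""
    candidates = {m for m in graph[new_msg] if k_of[m] <= k}
    # Repeatedly drop candidates with fewer than k - 2 neighbors among the candidates.
    changed = True
    while changed:
        changed = False
        for m in list(candidates):
            if len(graph[m] & candidates) < k - 2:
                candidates.discard(m)
                changed = True
    # Brute-force enumeration (an assumption; acceptable for small k).
    for group in combinations(sorted(candidates), k - 1):
        if all(b in graph[a] for a, b in combinations(group, 2)):
            return set(group) | {new_msg}
    return set()


def nbr_k_search(new_msg, graph, k_of):
    """Try the k values of new_msg and its neighbors in decreasing order."""
    levels = {k_of[new_msg]} | {k_of[m] for m in graph[new_msg]}
    for k in sorted(levels, reverse=True):
        if k < k_of[new_msg]:
            break
        clique = local_k_search(k, new_msg, graph, k_of)
        if clique:
            return clique
    return set()
```

For example, a new message with k = 2 whose two neighbors both require k = 3 and are neighbors of each other is anonymized in a clique of size 3, which gives it a higher anonymity level than it requested.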

So far we have only considered searching for cliques when each new message arrives. This may result in many unsuccessful searches and thus deteriorate performance in terms of the average time to process a message. Instead of immediately searching for a clique for each message, we can defer this processing. If a deferred message has not already been anonymized (together with other messages) by the time of its expiration, we can search for a clique in order to anonymize it before it expires. However, this latter approach will definitely increase user-perceived latency. To overcome this, we can perform the clique search phase for a new message m_sc only if the number of neighbors it has at its arrival is larger than or equal to α * m_sc.k. Here, α > 1 is a system parameter that adjusts the number of messages for which the clique search is deferred. Smaller values push the algorithm toward immediate processing. The parameter can be set statically at compile time based on experimental studies, or adaptively at runtime by observing the rate of successful clique searches with different α values. We name this variation of the algorithm Deferred CliqueCloak and the original algorithm Immediate CliqueCloak.
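The deferral test itself is a single comparison. The sketch below uses hypothetical function and parameter names.

```python
def should_search_now(num_neighbors: int, k: int, alpha: float) -> bool:
    """Deferred policy: search on arrival only if the message already has
    at least alpha * k neighbors; otherwise postpone the clique search."""
    return num_neighbors >= alpha * k
```

With alpha = 1.4 and k = 5, a message arriving with 8 neighbors is searched immediately, whereas one arriving with 6 neighbors is deferred.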

There are other dimensions to CliqueCloak algorithms that can improve the running time performance of anonymization (our technical report [4] provides an extended study of these dimensions). Here we discuss one such idea that may significantly improve the message processing time for extreme cases where constraint boxes are large. A progressive search technique may be applied, such that when a message is to be processed for anonymization, we use a progressively increasing set of neighbor nodes as the candidate set. If a smaller set is not sufficient to anonymize the message, we can add messages whose spatio-temporal points are further away after each progressive step. This helps in decreasing the message processing time when the constraint boxes are large, since such large boxes result in large candidate sets, although most of the time anonymization can easily be performed with a much smaller set.
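One possible shape of the progressive search is sketched below. The fixed step size, the Euclidean distance over (x, y, t) without unit scaling, and the `try_anonymize` callback are all assumptions introduced for illustration.

```python
import math


def progressive_candidates(point, neighbors, step):
    """Yield growing prefixes of the neighbors, nearest to `point` first.

    neighbors: list of (msg_id, (x, y, t)) pairs (assumed layout).
    """
    ordered = sorted(neighbors, key=lambda nb: math.dist(point, nb[1]))
    for size in range(step, len(ordered) + step, step):
        yield ordered[:size]


def progressive_search(point, neighbors, try_anonymize, step=8):
    """Call try_anonymize on ever larger candidate sets until it succeeds."""
    for candidates in progressive_candidates(point, neighbors, step):
        result = try_anonymize(candidates)
        if result:
            return result
    return None
```

The last prefix always contains every neighbor, so the progressive variant never misses a clique that a search over the full candidate set would find. It only reorders the work so that small, nearby candidate sets are tried first.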

5 Evaluation Metrics 

In this section we list several evaluation metrics of interest that can be used to evaluate the effectiveness and the efficiency of the message perturbation engine.

To evaluate the effectiveness of the proposed location k-anonymity model, an important measure is the success rate. Concretely, the primary goal of the cloaking algorithm is to maximize the number of messages perturbed successfully in accordance with their anonymization constraints. Success rate can be defined over a set S' ⊂ S of messages as the percentage of messages that are successfully anonymized (perturbed).

Important measures of efficiency include relative anonymity level, relative temporal resolution, relative spatial resolution, and message processing time. The first three are measures related to quality of service, whereas the last one is a performance measure.

Relative anonymity level is a measure of the level of anonymity provided by the cloaking algorithm, normalized by the level of anonymity required by the messages. We define relative anonymity level over a set T' ⊂ T of perturbed messages. Relative anonymity level cannot go below 1.

Relative spatial resolution is a measure of the spatial resolution provided by the cloaking algorithm, normalized by the minimum acceptable spatial resolution defined by the spatial tolerances. We define relative spatial resolution over a set of perturbed messages T' ⊂ T using a length operator that, applied to an interval I, gives its length. Higher relative spatial resolution values imply more effective cloaking, achieved with a smaller spatial cloaking region.

Relative temporal resolution is a measure of the temporal resolution provided by the cloaking algorithm, normalized by the minimum acceptable temporal resolution defined by the temporal tolerances. We define relative temporal resolution over a set of perturbed messages T' ⊂ T by

$$\frac{1}{|T'|} \sum_{m_t = R(m_s) \in T'} \frac{\text{length of the temporal interval of the constraint box of } m_s}{\text{length of the temporal interval of the cloaking box of } m_t}$$

Higher relative temporal resolution values imply more effective cloaking, achieved by a smaller temporal cloaking interval and thus with smaller delay due to perturbation. Relative spatial and temporal resolutions cannot go below 1.
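To make the three quality-of-service metrics concrete, the following sketch computes them for a set of anonymized messages. The record layout and the exact ratio forms are assumptions based on the prose descriptions above: tolerances are treated as half-widths of the constraint box, and the spatial ratio is taken as the square root of the area ratio. They are not definitions taken from the source, and the sketch assumes cloaking boxes with non-zero extent.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean


@dataclass
class Anonymized:
    k_required: int   # k requested by the original message
    group_size: int   # number of messages cloaked together
    dx: float         # spatial tolerance along x (assumed half-width)
    dy: float         # spatial tolerance along y (assumed half-width)
    dt: float         # temporal tolerance (assumed half-width)
    cloak_x: float    # extent of the cloaking box along x
    cloak_y: float    # extent of the cloaking box along y
    cloak_t: float    # extent of the cloaking box along t


def relative_anonymity(msgs):
    return mean(m.group_size / m.k_required for m in msgs)


def relative_temporal_resolution(msgs):
    return mean(2 * m.dt / m.cloak_t for m in msgs)


def relative_spatial_resolution(msgs):
    return mean(sqrt((2 * m.dx * 2 * m.dy) / (m.cloak_x * m.cloak_y)) for m in msgs)
```

Under these assumptions, each value equals 1 when a message is cloaked with exactly k messages in a box as large as its constraint box, and grows as the group gets larger or the cloaking box shrinks.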

Message processing time is a measure of the running time performance of the message perturbation engine. The message processing time may become a critical issue if the computational power at hand is not enough to handle the incoming messages at a high rate. In Section 6, we use the average CPU time needed to process 10^3 messages as the message processing time.

Parameter                        Default value
anonymity level range            {5, 4, 3, 2}
anonymity level zipf param       0.6
mean spatial tolerance           100 m
variance in spatial tolerance    40 m^2
mean temporal tolerance          30 s
variance in temporal tolerance   12 s^2
mean inter-wait time             15 s
variance in inter-wait time      6 s^2

Table 1: Message generation parameters

Parameter                                    Value
mean of car speeds for each road type        {90, 60, 50} km/h
std. dev. in car speeds for each road type   {20, 15, 10} km/h
traffic volume data                          {2916.6, 916.6, 250} per hour

Table 2: Car movement parameters

6 Experiments 

We break up the experimental evaluation into two components: the effectiveness of the personalized k-anonymity model in terms of location privacy quality, and the performance of the location perturbation engine and algorithms in terms of scalability. Due to space restrictions, in this section we mostly present the first set of experiments, which demonstrates the effectiveness of our perturbation engine in terms of location privacy guarantees under different settings with regard to the various metrics introduced in Section 5. A smaller set of experiments is included to give a summary characterization of message processing time. We divide the experiments into two groups, namely success rate and spatial/temporal resolution. Before presenting our experimental results, we first describe the trace generator used to generate the realistic traces employed in the experiments and the details of our experimental setup.

We have developed a trace generator that simulates cars moving on roads and generates requests using the position information from the simulation. The trace generator loads real-world road data, available from the National Mapping Division of the United States Geological Survey (USGS) [7] in SDTS [13] format. We use the transportation layer of 1:24K Digital Line Graphs (DLGs) as road data. We convert the graphs into Scalable Vector Graphics [12] format using the Global Mapper [5] software and use them as input to our trace generator. We extract three types of roads from the trace graph: class 1 (expressway), class 2 (arterial), and class 3 (collector). The generator uses real traffic volume data to calculate the total number of cars on each road type, as described in [6]. Once the number of cars on each type of road is determined, they are randomly placed into the graph and the simulation begins. Cars move along the roads and switch to other roads when they reach joints. The simulator tries to keep the fraction of cars on each type of road constant as time progresses. The cars change their speeds at each joint based on a normal distribution whose parameters are also input to the trace generator.

Figure 2: Success rates for different k values (x-axis: k as specified by the input message)

Figure 3: Relative anonymity levels for different k values (x-axis: k, the anonymity level as specified by the input message)

Figure 4: Message processing time and success rate of different approaches (configurations: IM/nbr, IM/local, DE/nbr, DE/local)

Figure 5: Success rate as a function of variances in spatial and temporal tolerances (x-axis: variance in spatial and temporal tolerances, as a multiple of the mean)

Figure 6: Success rate with respect to temporal tolerance with different inter-wait times

Figure 7: Success rate with respect to spatial tolerance with different inter-wait times

We used a map of the Chamblee region of the state of Georgia, USA, to generate the trace used in this paper. The map covers a region of ≈ 160 km^2. The mean speeds and standard deviations for each road type are given in Table 2. The traffic volume data is taken from [6] and is also listed in Table 2. These settings result in approximately 10,000 cars. The trace has a duration of one hour.

Each car generates several messages during the simulation. Each message specifies an anonymity level (k value) from the list {5, 4, 3, 2} using a zipf parameter of 0.6, with k = 5 being the most popular. The spatial and temporal tolerance values of the messages are selected independently using normal distributions whose default parameters are given in Table 1. Whenever a message is generated, the originator of the message waits until the message is anonymized or dropped, after which it waits for a normally distributed amount of time, called the inter-wait time, whose default parameters are also listed in Table 1. All parameters take their default values unless stated otherwise. We change many of these parameters to observe the behavior of the algorithms in different settings.
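The message generation settings can be sketched as follows. The Zipf weighting (probability proportional to 1/rank^0.6 over the list order) and the conversion of the Table 1 variances into standard deviations by taking square roots are assumptions made for this example, as are all function and key names.

```python
import math
import random

K_VALUES = [5, 4, 3, 2]   # most popular first
ZIPF_PARAM = 0.6


def zipf_weights(n, s):
    """Normalized weights proportional to 1 / rank**s for rank = 1..n."""
    raw = [1 / rank ** s for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def generate_message(rng):
    """Draw one message's anonymity level and tolerances with the default parameters."""
    weights = zipf_weights(len(K_VALUES), ZIPF_PARAM)
    k = rng.choices(K_VALUES, weights=weights)[0]
    # Assumption: Table 1 lists variances, so the standard deviation is the square root.
    spatial = rng.gauss(100.0, math.sqrt(40.0))
    temporal = rng.gauss(30.0, math.sqrt(12.0))
    return {"k": k, "spatial_tolerance_m": spatial, "temporal_tolerance_s": temporal}
```

Passing a seeded `random.Random` instance as `rng` keeps a generated trace reproducible across runs.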

For the spatial points of the messages, the default settings result in anonymizing around 70% of messages with an accuracy of < 18m in 75% of the cases, which we consider to be very good when compared to the E-911 requirement of 125m accuracy in 67% of the cases. For the temporal points of the messages, the default parameters result in a delay of < 10s in 75% of the cases and < 5s in 50% of the cases.

6.1 Success Rate 

Figure 2 shows the success rate for the nbr-k and local-k approaches. The success rate is shown (on the y-axis) for different groups of messages, each group representing messages with a certain k value (on the x-axis). The two leftmost bars show the success rate for all of the messages. The wider bars show the actual success rate provided by the CliqueCloak algorithm. The thinner bars represent a lower bound on the percentage of messages that cannot be anonymized no matter what algorithm is used. This lower bound is calculated as follows. For a message m_s, if the set U = {m_si | m_si ∈ S ∧ P(m_si) ∈ B_cn(m_s)} has size less than m_s.k, the message cannot be anonymized. This is because the total number of messages that ever appear inside the constraint box of m_s is less than m_s.k. However, even if the set U has size at least m_s.k, the message m_s may still not be anonymized under a hypothetical optimal algorithm. This is because the optimal choice may require anonymizing a subset of U that does not include m_s, together with some other messages not in U. As a result, the remaining messages in U may not be sufficient to anonymize m_s. It is not possible to design an on-line algorithm that is optimal in terms of success rate, because such an algorithm would require knowledge of future messages, which is not available beforehand. If a trace of the messages is available, as in this work, the optimal success rate can be computed off-line. However, we are not aware of a time- and space-efficient off-line algorithm for computing the optimal success rate. As a result, we use a lower bound on the number of messages that cannot be anonymized.
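The lower bound described above can be computed from a trace with a direct quadratic scan. The sketch below assumes each message is a dictionary with a spatio-temporal `point`, per-axis tolerances `tol` defining an axis-aligned constraint box centered at the point, and a required `k`. This layout is hypothetical, and any additional model constraints (for instance, on who sent the messages) are ignored here.

```python
def in_box(point, center, tol):
    """True if `point` lies inside the axis-aligned box center +/- tol."""
    return all(abs(p - c) <= t for p, c, t in zip(point, center, tol))


def count_unanonymizable(messages):
    """Count messages whose constraint box never contains k messages (itself included)."""
    count = 0
    for m in messages:
        inside = sum(
            1 for other in messages if in_box(other["point"], m["point"], m["tol"])
        )
        if inside < m["k"]:
            count += 1
    return count
```

Dividing the returned count by the total number of messages gives the fraction that no algorithm could anonymize, which is the quantity plotted by the thinner bars.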

There are three observations from Figure 2. First, the nbr-k approach provides an around 15% better average success rate than local-k. Second, the best average success rate achieved is around 70%. Of the 30% of messages that are dropped, at least 65% cannot be anonymized by any algorithm, meaning that in the worst case the remaining 35% of the dropped messages, or about 10% of all messages, are dropped due to the non-optimality of the algorithm with respect to success rate. If we knew a way to construct the optimal algorithm (with a reasonable time and space complexity) given full knowledge of the trace, we could obtain a better bound. Last, messages with larger k values are harder to anonymize. The success rate for messages with k = 2 is around 30% higher than the success rate for messages with k = 5.

Figure 3 shows the relative anonymity level for the nbr-k and local-k approaches. The relative anonymity level is shown (on the y-axis) for different groups of messages, each group representing messages with a certain k value (on the x-axis). Nbr-k shows a relative anonymity level of 1.7 for messages with k = 2, meaning that on average these messages are anonymized with k = 3.4 by the algorithm. Local-k shows a lower relative anonymity level of 1.4 for messages with k = 2. This gap between the two approaches vanishes for messages with k = 5, since neither algorithm attempts to search for cliques of sizes larger than the maximum of the k values specified by the messages. The gap in relative anonymity level between nbr-k and local-k shows that the former approach is able to anonymize messages with smaller k values together with the ones with higher k values. This is particularly good for messages with higher k values, as they are harder to anonymize. This also explains why nbr-k results in a better success rate.

Figure 8: Relative temporal and spatial resolution distributions

Figure 4 plots the average success rate (y-axis on the left side) and the message processing time (y-axis on the right side) for the nbr-k and local-k search approaches with immediate or deferred processing mode. For the deferred processing mode, α is taken as 1.4 (as it gave the highest success rate). Besides the immediate approach providing a better success rate than the deferred approach, the surprising observation from the figure is that the deferred approach does not provide an improvement in terms of message processing time. Figure 4 also shows (above the x-axis) the number of times clique search is performed for the different approaches. Although the deferred approach results in slightly higher message processing time, it decreases the number of times the clique search is performed by around 50% (for nbr-k). The reason the deferred approach still performs worse in terms of total processing time is that, for k < 10, the index update dominates the cost of processing a message, and the deferred approach results in a more crowded index. However, the deferred approach is promising in terms of message processing time for cases where k values are very large (so that clique search dominates the cost). Another potential enhancement is to design a more efficient multi-dimensional index to replace the in-memory R*-tree.

Figure 5 plots the success rate for different mean temporal tolerances and different variances in temporal and spatial tolerances. It shows that the algorithm is much less sensitive to changes in the variances of the spatial and temporal tolerances than to changes in the mean temporal tolerance. For instance, when the mean temporal tolerance is 60s, changing the variance in both spatial and temporal tolerances from 0.2 times their means to 1.6 times their means only decreases the success rate from 80% to 75%, whereas decreasing the mean temporal tolerance from 60s to 15s decreases the success rate by approximately 40% of its value (for instance, from 80% to 50% when the variances are equal to 0.2 times their means).

Figure 6 plots the average success rate as a function of mean inter-wait time and mean temporal tolerance. Similarly, Figure 7 plots the average success rate as a function of mean inter-wait time and mean spatial tolerance. For both figures, the variances are always set to 0.4 times the means. We observe that the smaller the inter-wait time, the higher the success rate. For smaller values of the temporal and spatial tolerances, the decrease in inter-wait time becomes more important in terms of keeping the success rate high. When the inter-wait time is high, we have a lower rate of messages coming into the system. Thus, it becomes harder to anonymize messages, as the constraint graph becomes sparser. Both spatial and temporal tolerances have a tremendous effect on the success rate. Although high success rates (around 85%) are achieved with high temporal and spatial tolerances, as we will show in the next section, the relative temporal and spatial resolutions are much larger than 1 in such cases, meaning that the system assigns much smaller spatio-temporal cloaking boxes to the messages compared to the constraint boxes.

6.2 Spatial/Temporal Resolution 

In Section 6.1, we have shown that one way to improve the success rate is to increase the spatial and temporal tolerance values specified by the messages. In this section, we show that our CliqueCloak algorithms have the nice property that, for most of the anonymized messages, the cloaking box generated by the algorithm is much smaller than the constraint box of the received message (specified by the tolerance values), resulting in higher relative spatial and temporal resolutions.

Figure 8(a) plots the frequency distribution (y-axis) of the relative temporal resolutions (x-axis) of the anonymized messages. Figure 8 shows that in 75% of the cases the provided relative temporal resolution is > 3.25, corresponding to an average temporal accuracy of roughly < 10s (recalling that the default mean temporal tolerance was 30s). For 50% of the cases it is > 5.95, and for 25% of the cases it is > 17.25. This indicates that the observed performance with regard to temporal resolution is much better than the worst case specified by the temporal tolerances.

Figure 8(b) plots the frequency distribution (y-axis) of the relative spatial resolutions (x-axis) of the anonymized messages. Figure 8 shows that in 75% of the cases the provided relative spatial resolution is > 5.85, corresponding to an average spatial accuracy of roughly < 18m (recalling that the default mean spatial tolerance was 100m). For 50% of the cases it is > 7.75, and for 25% of the cases it is > 12.55. This indicates that the observed performance with regard to spatial resolution is much better than the worst case specified by the spatial tolerances.
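As a rough check of the accuracy figures quoted in this section, and assuming that an accuracy value is obtained by dividing the default mean tolerance by the corresponding relative resolution (our reading of the text, not a formula it states), the numbers work out as follows:

```latex
\frac{30\,\mathrm{s}}{3.25} \approx 9.2\,\mathrm{s}, \qquad
\frac{30\,\mathrm{s}}{5.95} \approx 5.0\,\mathrm{s}, \qquad
\frac{100\,\mathrm{m}}{5.85} \approx 17.1\,\mathrm{m}
```

These values are consistent with the reported bounds of roughly 10s and 5s for temporal accuracy and roughly 18m for spatial accuracy.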

7 Related Work 

Previous work on location privacy has mostly focused on a policy-based approach [3, 10], specialized in the telematics or telecommunication domain. Users may use the policies to specify their privacy preferences. These policies serve as a location information sharing agreement on which data can be collected and shared, when and for what purpose the data can be used, and how and to whom it can be distributed. Mobile users have to trust the LBSs that private location information is adequately protected. Another approach to location privacy is the location k-anonymity based approach, which depersonalizes data through perturbation techniques before forwarding it to the LBS. Location k-anonymity was first studied in [6]. Its location perturbation is performed by a quadtree-based algorithm executing spatial and temporal cloaking. However, this work suffers from several drawbacks. First, it assumes a system-wide static k value for all mobile users, which hinders the service quality for those mobile nodes whose privacy requirements can be satisfied using smaller k values. Furthermore, this assumption is unrealistic in practice, as mobile users tend to have varying privacy protection requirements under different contexts and on different subjects. Second, the approach fails to provide any quality of service guarantees with respect to the sizes of the cloaking boxes produced. This is because the quadtree-based algorithm anonymizes the messages by dividing the quadtree cells until the number of messages in each cell falls below k and by returning the previous quadrant for each cell as the spatial cloaking box of the messages under that cell. In comparison, our unified framework for location k-anonymity captures the desired degree of privacy and quality on a per-user basis, supporting mobile users with diverse context-dependent location privacy requirements. Our message perturbation engine can anonymize a stream of incoming messages through the use of efficient CliqueCloak algorithms, where each message can specify an independent k value, as well as customized spatial and temporal tolerance values to restrict the size of the cloaking box produced as a result of perturbation.

Samarati and Sweeney have developed a k-anonymity model [14] for protecting data privacy and a set of generalization and suppression techniques [15] for safeguarding the anonymity of individuals whose information is recorded in database tables. Our work, although in a different context, can be viewed as location perturbation of the messages sent by mobile nodes communicating with LBS providers through a trusted anonymity server.

8 Conclusion 

We have proposed a personalized k-anonymity model for providing location privacy. Our model allows each mobile user to define, and to modify at the granularity of single messages, a minimum anonymity level requirement, as well as upper bounds on the inaccuracy to be introduced by the cloaking algorithm in the temporal and spatial dimensions. We have developed a novel message perturbation engine to implement this model, which can effectively anonymize messages sent by the mobile nodes in accordance with location k-anonymity, while satisfying the privacy and quality requirements of the users. Several spatio-temporal cloaking algorithms, called CliqueCloak algorithms, are proposed to work as a part of the perturbation engine. We experimentally studied the behavior of our CliqueCloak algorithms under various conditions, using realistic location data synthetically generated using real road maps and traffic volume data.